BU CS TR98-006. Appeared in Proc. Computer Animation, Philadelphia, PA, June 1998.


Active Voodoo Dolls: A Vision Based Input Device for Nonrigid Control

John Isidoro and Stan Sclaroff
Computer Science Department
Boston University
jisidoro@bu.edu, sclaroff@bu.edu


Abstract 

A vision based technique for nonrigid control is presented that can be used for animation and video game applications. The user grasps a soft, squishable object in front of a camera and moves and deforms the object in order to specify motion. Active Blobs, a nonrigid tracking technique, is used to recover the position, rotation, and nonrigid deformations of the object. The resulting transformations can be applied to a texture mapped mesh, thus allowing the user to control it interactively. Our use of texture mapping hardware in tracking makes the system responsive enough for interactive animation and video game character control.

1  Introduction 

A wide variety of input devices have been invented for human-computer interaction. One category of devices transforms intentional human motion into a measurable analog quantity. Such devices must not only be accurate; they must also be intuitive, so that motion of the device corresponds directly with the motion that the human is controlling.

Basic analog joysticks and mice have two degrees of freedom, which correspond to the screen's x and y axes. The SpaceOrb and the Polhemus tracker provide six rigid degrees of freedom. Haptic feedback pens, data gloves, and body suits can provide even more than six degrees of freedom. However, all of these devices can only sense rigid or, at best, articulated motion. Although it is theoretically possible to build a device to measure higher degrees of freedom in motion (bending, twisting, etc.), vision based techniques offer an inexpensive alternative, without the wiring and cumbersome devices.

In this paper, we present a vision based technique for nonrigid control. The user grasps a soft object in front of a camera and moves and deforms it in order to specify motion. Active Blobs, a nonrigid tracking technique, is used to recover the object position, rotation, and deformations. The resulting transformations can be applied to a graphics model, thereby allowing the user to control motion interactively. The use of texture mapping hardware in tracking makes our system responsive enough for interactive animation and video game character control.



2  Related  Work 

Sensing  motion  and  transforming  it  into  meaningful  data 
using  a  camera  is  most  easily  done  using  some  form  of 
tracking.  Most  previous  systems  that  use  vision  to  control 
graphics  have  tracked  face,  hand,  or  body  gestures. 

Some previous approaches are based on tracking feature points across a sequence of video frames. Williams [26] tracked iridescent points placed on the face of the user. The points were used to pull regions of a triangular mesh of a human head. This approach has the disadvantage that the dots must be painted or pasted onto the user's face. In contrast, Azarbayejani et al. [1] used a feature finding algorithm to track facial points and converted them into three-dimensional estimates using an extended Kalman filter. Motion estimates were then used to drive a rigid graphics model of the user's head.

Contour-based approaches have also been proposed. For instance, Terzopoulos and Waters [25] used snakes to follow lines drawn on the user's face. The snakes drive an intricate face model in which muscle and skin are physically simulated. In a similar approach, Blake and Isard employed contours in gesture tracking [4], allowing the user to use hand motion as a three-dimensional mouse. Rigid motion supplies the position of the mouse, while nonrigid motion supplies button pressing and releasing actions.

Other researchers [8, 10] have used optic flow to drive the motion of an anatomically-motivated polyhedral model of the face. Optic flow has also been used to capture motion of more general nonrigid models like deformable superquadrics [17, 18]. In a different approach, histograms of optic flow direction can be used directly in hand gesture recognition [11], allowing television viewers to freely use their hand like a mouse to make menu selections.

Other researchers have used simple low-level image processing techniques such as normalized image correlation [7], or background subtraction and thresholding [5, 15, 16, 27], for tracking moving body parts, facial features, and hand gestures. The resulting tracking output can be used to drive the animation of a face model [9], to drive manipulation of virtual objects [15], or to interact with animated creatures in a virtual world [27, 16, 5]. Such low-level methods are surprisingly useful for qualitative measures (e.g., person is standing, object is bending) but are not exact enough to allow quantitative measures of fine-grain deformation (e.g., how much bending or stretching). Furthermore, correlation-based methods are not well-suited to tracking moving objects that undergo scaling, rotation, or nonrigid deformation, since tracking is accomplished via correlation with a translating template.

Figure 1: Example voodoo doll object used in our experiments (a), and a deformable color region on the object used for tracking (b). The deformable image region is modeled using a texture mapped triangular mesh. The underlying triangular model is shown in (c).

Larger deformations can be accommodated if the problem is posed in terms of deformable image registration [2, 23]. In particular, La Cascia et al. [6] use this approach to register a texture-mapped polygonal model of a person's head with an incoming video sequence. The approach allows tracking of a wider range of free motion of the head than was possible using flow or simple correlation methods.

3  Approach 

In our system, the user grabs a “voodoo doll” that can be as simple as a decorated piece of foam or as complex as a toy stuffed animal. The user moves, rotates, and squishes the voodoo doll in the same way that he wants to manipulate the corresponding graphics object on the screen. This offers the intuitiveness of a joystick or SpaceOrb while providing more degrees of freedom. The only limitation on the voodoo doll's movement is that it must remain visible to the camera at all times so that its deformations can be followed.

An  example  voodoo  doll  object  is  shown  in  Fig.  1(a). 
In  this  case,  the  object  is  simply  a  piece  of  foam  rubber 
painted  with  a  colored  grid  pattern. 

For tracking purposes, we build a two-dimensional deformable texture mapped model of a region on the squishy object. We model the region using the active blob formulation [23]: the voodoo doll's deformation is tracked using a deformable triangular mesh that captures object shape, plus a color texture map that captures object appearance.

As shown in Fig. 1(b), a two-dimensional active blob model is constructed for our example using a modified Delaunay triangulation algorithm. The blob's appearance is then captured as a color texture map and applied directly to the triangulated model, as shown in Fig. 1(c).

Nonrigid motion tracking is achieved by warping this texture-mapped blob to register it with each incoming video frame. A blob warp is defined as a deformation of the mesh followed by a bilinear resampling of the texture mapped triangles. By defining image warping in this way, it is possible to harness the hardware-accelerated triangle texture mapping capabilities that are becoming prevalent in mid-range workstations and PCs.

Figure 2: Example animation of a two-dimensional character via deformation of the voodoo doll object as constructed in Fig. 1. The character is defined as a polygonal model (a), with a texture map applied (b). By directly deforming the voodoo doll (c), the user can control the deformation of the character. The resulting deformation of the character is shown in (d).

An example of using the voodoo doll to control deformation of a two-dimensional animated character is shown in Fig. 2. In this case, the character is defined as a polygonal model (Fig. 2(a)), with a texture map applied (Fig. 2(b)). By directly deforming the voodoo doll (Fig. 2(c)), the user can control the deformation of the character. The resulting deformation of the character is shown in Fig. 2(d).

The  same  deformations  used  to  warp  the  tracking  blob 
can  be  applied  to  a  completely  separate  graphics  object. 
As  was  shown  in  this  example,  the  approach  can  be  used 
to  warp  a  2D  model.  As  will  be  seen  later  in  this  paper,  the 
approach  can  also  be  used  to  control  the  deformation  of  a 
3D  polyhedral  model. 

By deforming the voodoo doll, the user can control deformation of the graphics model with the benefit of haptic feedback. Furthermore, because the interface is visual, it is unencumbered by wires.

The voodoo doll input device allows not only for translation, scaling, and rotation, but also for bending, shearing, tapering, and other nonrigid deformations. It is also possible for the user to select only a subset of deformation parameters to be applied to the character. For instance, it is possible to use the voodoo doll to control bending of the character, while leaving rotation, translation, or other deformation parameters fixed.




4  Tracking  Formulation:  Active  Blobs 

Tracking is accomplished via the active blobs formulation. To gain better robustness, active blobs incorporate information about not only shape, but also color image appearance. Active blobs also provide some robustness to photometric variations, including specular highlights and shadows. By taking advantage of texture mapping hardware, active blobs can track nonrigidly deforming shapes at speeds approaching video frame rate. We now give an overview of the active blobs formulation; for a more detailed formulation, readers are referred to [23].

4.1  Blob  Warping 

In the active blobs formulation, nonrigid deformation is controlled by parametric functions. These functions are applied to the vertices that define the active blob's two-dimensional triangular mesh:

$$X' = f(X, \mathbf{u}) \qquad (1)$$

where $\mathbf{u}$ is a vector containing deformation parameters, $X$ contains the undeformed mesh vertex locations, and $X'$ contains the resulting displaced vertices.

Image warping and interpolation are accomplished by displacing the mesh vertices and then resampling using bilinear interpolation. Thus we define a warping function for an input image, $I$:

$$I' = W(I, \mathbf{u}) \qquad (2)$$

where $\mathbf{u}$ is a vector containing warping parameters, and $I'$ is the resulting warped image.

Perhaps the simplest warping functions to be used are those of a 2D affine model or an eight parameter projective model [24]. Unfortunately, these functions are only suitable for approximating the rigid motion of a planar patch. The functions can be extended to include linear and quadratic polynomials [2]; however, the extended formulation cannot model general nonrigid motion.

A more general parameterization of nonrigid motion can be obtained via the use of the modal representation [19]. In the modal representation, deformation is represented in terms of eigenvectors of a finite element (FE) model. Taken together, modes form an orthogonal basis set for describing nonrigid shape deformation. Deformation of a triangle mesh can be expressed as the linear combination of orthogonal modal displacements:

$$X' = X + \sum_{j=1}^{m} \boldsymbol{\phi}_j u_j \qquad (3)$$

where $u_j$ is the $j$th mode's parameter value, and the eigenvector $\boldsymbol{\phi}_j$ defines the displacement function for the $j$th modal deformation.

The lowest order modes correspond with rigid degrees of freedom. The higher-order modes correspond qualitatively with human notions of nonrigid deformation: scaling, stretching, skewing, bending, etc. Such an ordering of shape deformation allows us to select which types of deformations are to be observed. For instance, it may be desirable to make tracking independent of rotation, position, and/or scale. To do this, we ignore displacements in the appropriate low-order modes.

4.2  Including  Illumination  in  the  Model

Surface illumination may vary as the user moves and manipulates the active voodoo doll. To account for this, we can derive a parameterization for modeling brightness and contrast variations:

$$I' = c\,W(I, \mathbf{u}) + b \qquad (4)$$

where $b$ and $c$ are brightness and contrast terms that are implemented using the workstation's image blending functionality.

Blob deformation and photometric parameters can now be combined in a generic parameter vector $\mathbf{a}$. The image warping function becomes:

$$I' = W(I, \mathbf{a}) \qquad (5)$$
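To make Eqs. 3 and 4 concrete, here is a minimal Python sketch. It assumes mesh vertices are stored as a flat list [x1, y1, x2, y2, ...] and that the mode vectors are already available; in the actual system the modes come from FE eigen-analysis [19], whereas the two mode vectors below are hand-made for illustration. The function names are hypothetical, and the photometric step stands in for the workstation's image blending.

```python
from typing import List, Sequence


def deform_vertices(X: Sequence[float],
                    modes: Sequence[Sequence[float]],
                    u: Sequence[float]) -> List[float]:
    """Eq. 3: X' = X + sum_j phi_j * u_j, with X stored as [x1, y1, x2, y2, ...]."""
    if len(modes) != len(u):
        raise ValueError("need exactly one parameter per mode")
    X_new = list(X)
    for phi, u_j in zip(modes, u):
        if len(phi) != len(X):
            raise ValueError("mode vector length must match the vertex vector")
        for k, d in enumerate(phi):
            X_new[k] += u_j * d  # add this mode's displacement, scaled by u_j
    return X_new


def apply_photometric(warped: Sequence[float], c: float, b: float) -> List[float]:
    """Eq. 4: scale the warped pixel values by contrast c and offset by brightness b."""
    return [c * p + b for p in warped]


# Three vertices of one triangle, with two hypothetical modes:
# a rigid x-translation and a stretch that moves only the second vertex in x.
X = [0.0, 0.0, 1.0, 0.0, 0.0, 1.0]
translate_x = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
stretch_x = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
print(deform_vertices(X, [translate_x, stretch_x], [2.0, 0.5]))
# prints [2.0, 0.0, 3.5, 0.0, 2.0, 1.0]
```

Setting a mode's parameter to zero (for example, one of the low-order rigid modes) is one simple way to realize the option of ignoring displacements in selected modes.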


5  Tracking 

The first goal of our system is nonrigid shape tracking. To achieve this, the system recovers warping parameters that register a template image $I_0$ with a stream of incoming video images. The maximum likelihood solution to this two-image registration problem consists of minimizing the squared error over all the pixels within the blob:

$$E_{image} = \frac{1}{n} \sum_{i=1}^{n} e_i^2 \qquad (6)$$

$$e_i = \left\| I'(x_i, y_i) - I(x_i, y_i) \right\| \qquad (7)$$

where $I'(x_i, y_i)$ is a pixel in the warped template image as prescribed in Eq. 5, and $I(x_i, y_i)$ is the pixel at the same location in the input. The above equation is formulated for comparing two color images; thus, it incorporates the sum of squared differences over all channels at each pixel.
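A direct transcription of Eqs. 6 and 7 is sketched below. It assumes the warped template and the input have already been sampled at the same n blob pixels and stored as RGB tuples; the sampling step itself (the texture-mapped warp) is not shown, and the function name is hypothetical.

```python
from typing import Sequence, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B)


def image_error(warped: Sequence[Pixel], target: Sequence[Pixel]) -> float:
    """Eqs. 6-7: mean over blob pixels of the squared RGB distance e_i^2."""
    if not warped or len(warped) != len(target):
        raise ValueError("images must be non-empty and cover the same blob pixels")
    total = 0.0
    for p, q in zip(warped, target):
        # e_i^2 = ||I'(x_i, y_i) - I(x_i, y_i)||^2, summed over the color channels
        total += sum((a - b) ** 2 for a, b in zip(p, q))
    return total / len(warped)


warped = [(1.0, 2.0, 3.0), (0.0, 0.0, 0.0)]
target = [(1.0, 2.0, 3.0), (3.0, 4.0, 0.0)]
print(image_error(warped, target))  # prints 12.5
```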


5.1  Robust  Registration 

Image registration can be corrupted by outliers. The process can be made less sensitive to outliers if we replace the quadratic error norm with a robust error norm [13]:

$$E_{image} = \frac{1}{n} \sum_{i=1}^{n} \rho(e_i, \sigma) \qquad (8)$$

where $\sigma$ is a scale parameter that is determined based on expected image noise. If it is assumed that noise is Gaussian distributed, then the optimal error norm is simply the quadratic norm $\rho(e_i, \sigma) = e_i^2 / (2\sigma^2)$. However, robustness to outliers can be further improved via the use of a Lorentzian error norm:

$$\rho(e_i, \sigma) = \log\left(1 + \frac{e_i^2}{2\sigma^2}\right) \qquad (9)$$

This norm replaces the traditional quadratic norm found in least squares, and is equivalent to the incorporation of an analog outlier process in our objective function [3]. The formulation results in better robustness to specular highlights and occlusions. For efficiency, the log function can be implemented via table look-up.
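The table look-up mentioned above can be sketched as follows, assuming 8-bit RGB input so that every squared color distance is an integer between 0 and 3·255²; the table size and the function names are illustrative choices, not taken from the original implementation.

```python
import math
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (R, G, B)


def lorentzian(e_sq: float, sigma: float) -> float:
    """Eq. 9 written in terms of the squared residual: log(1 + e^2 / (2 sigma^2))."""
    return math.log1p(e_sq / (2.0 * sigma * sigma))


def build_lorentzian_table(sigma: float, channels: int = 3) -> List[float]:
    """Precompute rho for every possible integer squared RGB distance."""
    max_sq = channels * 255 * 255
    return [lorentzian(float(s), sigma) for s in range(max_sq + 1)]


def robust_image_error(warped: Sequence[Pixel], target: Sequence[Pixel],
                       table: Sequence[float]) -> float:
    """Eq. 8: mean robust error, with rho looked up instead of calling log()."""
    total = 0.0
    for p, q in zip(warped, target):
        e_sq = sum((a - b) ** 2 for a, b in zip(p, q))  # exact integer for 8-bit input
        total += table[e_sq]
    return total / len(warped)


table = build_lorentzian_table(sigma=10.0)
print(robust_image_error([(0, 0, 0), (0, 0, 0)], [(3, 4, 0), (255, 255, 255)], table))
```

With sigma = 10, the saturated outlier pixel in this example contributes about 6.9 to the sum, whereas the quadratic norm would charge it about 975; that bounded growth is what limits the influence of highlights and occlusions.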

Equation 8 includes a data term only; thus it only enforces the recovered model's fidelity to the image measurements. The formulation can be extended to include a regularizing term that enforces the priors on the model parameters $\mathbf{a}$:

$$E = \frac{1}{n} \sum_{i=1}^{n} \rho(e_i, \sigma) + \gamma \sum_{j=1}^{m} \beta_j a_j^2 \qquad (10)$$

where $\beta_j$ are the penalties associated with changing each parameter, and $\gamma$ is a constant that controls the relative importance of the regularization term. The penalties can be derived directly from the modal stiffness obtained in modal analysis [19].
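The regularized energy, a Lorentzian data term plus a penalty of the form γ Σ β_j a_j², can be evaluated as sketched below. The sketch assumes the squared residuals e_i² have already been computed and that the penalties β_j are supplied by the caller (for example, derived from the modal stiffnesses); the value of γ used in the original system is not stated, so it is left as an argument, and the function name is hypothetical.

```python
import math
from typing import Sequence


def total_energy(sq_residuals: Sequence[float], sigma: float,
                 params: Sequence[float], penalties: Sequence[float],
                 gamma: float) -> float:
    """Robust data term plus gamma * sum_j beta_j * a_j^2."""
    n = len(sq_residuals)
    # Mean Lorentzian error over the blob pixels (Eqs. 8-9).
    data = sum(math.log1p(s / (2.0 * sigma * sigma)) for s in sq_residuals) / n
    # Prior: stiffer parameters (larger beta_j) are penalized more for moving.
    prior = sum(beta * a * a for a, beta in zip(params, penalties))
    return data + gamma * prior


print(total_energy([0.0, 0.0], 1.0, [1.0, 2.0], [1.0, 0.5], 0.1))
```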

5.2  Difference  Decomposition 

To minimize the energy (Equation 10), a difference decomposition approach [12] is used. The approach offers the benefit that it requires the equivalent of O(1) image gradient calculations and O(N) image products per iteration.

In the difference decomposition, we define a basis of difference images generated by adding small changes to each of the blob parameters. Each difference image takes the form:

$$\mathbf{b}_k = I_0 - W(I_0, \mathbf{n}_k) \qquad (11)$$

where $I_0$ is the template image, and $\mathbf{n}_k$ is the parameter displacement vector for the $k$th basis image, $\mathbf{b}_k$. Each resultant difference image becomes a column in a difference decomposition basis matrix $B$. This basis matrix can be determined as a precomputation.

During tracking, an incoming image $I$ is inverse warped into the blob's coordinate system using the most recent estimate of the warping parameters $\mathbf{a}$. We then compute the difference between the inverse-warped image and template:

$$D = I_0 - W^{-1}(I, \mathbf{a}) \qquad (12)$$


This difference image $D$ can then be approximated in terms of a linear combination of the difference decomposition's basis vectors:

$$D \approx B\mathbf{q} \qquad (13)$$

where $\mathbf{q}$ is a vector of basis coefficients.

Thus, the maximum likelihood estimate of $\mathbf{q}$ can be obtained via least squares:

$$\mathbf{q} = (B^T B)^{-1} B^T D \qquad (14)$$

The change in the image warping parameters is obtained via matrix multiplication:

$$\Delta\mathbf{a} = N\mathbf{q} \qquad (15)$$

where $N$ has columns formed by the parameter displacement vectors $\mathbf{n}_k$ used in generating the difference basis. The basis and inverse matrices can be precomputed; this is the key to the difference decomposition's speed. If needed, this minimization procedure can be iterated at each frame until the percentage change in the error residual is below a threshold, or the number of iterations exceeds some maximum.
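The precomputation that makes the method fast can be sketched in plain Python as below. It assumes each difference image b_k has already been flattened into a list of pixel values (the warp W that produces them is not shown) and, as an illustrative choice, folds N(BᵀB)⁻¹Bᵀ into a single precomputed matrix so that each frame costs one dot product per parameter. The helper names are hypothetical; a real implementation would use an optimized linear-algebra library rather than hand-written elimination.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def solve(A: Sequence[Sequence[float]], rhs: Sequence[float]) -> Vector:
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("B^T B is singular; choose different perturbations")
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def precompute_update(B_cols: Sequence[Vector], N_cols: Sequence[Vector]) -> List[Vector]:
    """Return U = N (B^T B)^{-1} B^T, so that Eqs. 14-15 collapse to delta_a = U D."""
    k = len(B_cols)
    BtB = [[dot(bi, bj) for bj in B_cols] for bi in B_cols]
    # Column c of (B^T B)^{-1}; the matrix is symmetric, so these are also its rows.
    inv = [solve(BtB, [1.0 if r == c else 0.0 for r in range(k)]) for c in range(k)]
    n_pix, n_par = len(B_cols[0]), len(N_cols[0])
    # P = (B^T B)^{-1} B^T, one row per basis image.
    P = [[sum(inv[r][c] * B_cols[c][p] for c in range(k)) for p in range(n_pix)]
         for r in range(k)]
    # U = N P, where column r of N is the parameter displacement n_r.
    return [[sum(N_cols[r][i] * P[r][p] for r in range(k)) for p in range(n_pix)]
            for i in range(n_par)]


def parameter_update(U: Sequence[Vector], D: Sequence[float]) -> Vector:
    """Per-frame work: one dot product per parameter (Eqs. 14-15)."""
    return [dot(row, D) for row in U]


# Toy data: two 4-pixel difference images and their parameter displacements.
B_cols = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 1.0]]
N_cols = [[0.5, 0.0], [0.0, 0.25]]
U = precompute_update(B_cols, N_cols)
D = [2.0, -4.0, 0.0, -2.0]  # exactly 2*b_1 - 4*b_2
print(parameter_update(U, D))  # approximately [1.0, -1.0]
```

During tracking, `parameter_update` would be called again with a fresh D after each re-warp, which is the per-frame iteration described above.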

The difference decomposition can be extended to incorporate the robust error norm of Eq. 9:

$$\hat{b}_k(x_i, y_i) = \mathrm{sign}\big(b_k(x_i, y_i)\big)\sqrt{\rho\big(b_k(x_i, y_i), \sigma\big)} \qquad (16)$$

where $b_k(x_i, y_i)$ is the basis value at the $i$th pixel. The difference template $D$ is computed using the same formula. Finally, the formulation can be extended to include a regularizing term that enforces the priors on the model parameters, as described in [23].
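The robust extension amounts to a per-value transform, sign(b)·sqrt(ρ(b, σ)), applied to each basis image and to the difference image D before the least-squares step. The sketch below applies it to individual scalar entries with the Lorentzian of Eq. 9; whether the original system applied it per color channel or per pixel color distance is not stated here, so treat that as an assumption. The function name is hypothetical.

```python
import math
from typing import List, Sequence


def robust_transform(values: Sequence[float], sigma: float) -> List[float]:
    """sign(b) * sqrt(rho(b, sigma)) with the Lorentzian rho of Eq. 9."""
    out = []
    for v in values:
        rho = math.log1p(v * v / (2.0 * sigma * sigma))
        out.append(math.copysign(math.sqrt(rho), v))  # keep the sign of the raw value
    return out


print(robust_transform([0.0, 2.0, -2.0, 100.0], 1.0))
```

Squaring a transformed entry gives back ρ of the original value, so an ordinary least-squares fit on transformed quantities minimizes a sum of Lorentzian terms rather than raw squared errors; large values such as 100.0 are compressed to roughly 2.9.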

6  Using  Observed  Deformations  to  Animate  Objects

As the user manipulates the voodoo doll input device, we want corresponding deformations to be applied to the graphics model. The appropriate amounts of deformation to be applied are determined by tracking the deforming active blob as described above.

We could apply the observed deformations directly to the computer model. However, due to possible differences in scale, orientation, and position, we first need to allow the user to define the appropriate object-centered coordinate frame and bounding box for the animated object. This helps to establish a mapping between the coordinate spaces of the voodoo doll input device and the animated model.

The voodoo doll and the graphics model may not have the same overall shape or extent, and may have different polygonal sampling. Therefore, we must employ an interpolation scheme for mapping deformations from the voodoo doll input device onto the graphics model. To begin with, we position, rotate, and scale the model such that its bounding box is the same as the voodoo doll's. Next, we use the doll's finite element interpolants to obtain the modal displacement vectors of Eq. 3:

$$\tilde{\boldsymbol{\phi}}_j = H(X_{model})\,\boldsymbol{\phi}_j \qquad (17)$$

where $\tilde{\boldsymbol{\phi}}_j$ is the interpolated modal displacement vector describing how deformations of the voodoo doll are applied at the graphics model's triangle mesh vertices $X_{model}$. The FE interpolation matrix, $H$, is formulated using Gaussian interpolants as specified in [22].
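One way to realize an interpolation matrix like H is sketched below, assuming standard Gaussian radial-basis interpolation: weights are solved so that the interpolated field reproduces the doll's modal displacements exactly at its nodes, and the field is then evaluated at the (already bounding-box-aligned) model vertices. This is a stand-in for the finite-support Gaussian interpolants of [22], whose exact form is not reproduced in this paper; the width sigma and all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def gaussian(p: Point, q: Point, sigma: float) -> float:
    d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return math.exp(-d2 / (2.0 * sigma * sigma))


def solve(A: Sequence[Sequence[float]], rhs: Sequence[float]) -> List[float]:
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("interpolation matrix is singular")
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def interpolate_mode(doll_nodes: Sequence[Point], phi: Sequence[float],
                     model_vertices: Sequence[Point], sigma: float) -> List[float]:
    """Map one doll mode phi = [dx1, dy1, dx2, dy2, ...] onto the model vertices."""
    n = len(doll_nodes)
    G = [[gaussian(a, b, sigma) for b in doll_nodes] for a in doll_nodes]
    # Weights chosen so the interpolant reproduces phi exactly at the doll's nodes.
    wx = solve(G, [phi[2 * i] for i in range(n)])
    wy = solve(G, [phi[2 * i + 1] for i in range(n)])
    out: List[float] = []
    for v in model_vertices:
        g = [gaussian(v, node, sigma) for node in doll_nodes]
        out.append(sum(gi * wi for gi, wi in zip(g, wx)))
        out.append(sum(gi * wi for gi, wi in zip(g, wy)))
    return out


nodes = [(0.0, 0.0), (1.0, 0.0)]
print(interpolate_mode(nodes, [1.0, 0.0, 1.0, 0.0], [(0.5, 0.0), (100.0, 100.0)], 1.0))
```

Because every Gaussian decays with distance, model vertices far from all doll nodes receive almost no displacement; that behavior is one reason the model is first aligned to the doll's bounding box.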

To give the user further control over deformation, we allow the user to specify that only a subset of the observed deformation parameters is to be applied to the graphics model. For instance, it is possible to use the voodoo doll to control bending of the graphics model, while leaving rotation, translation, and/or other deformation parameters fixed.
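Selecting a subset of parameters can be expressed as applying the modal sum of Eq. 3 to the model using only the chosen modes. The sketch assumes the interpolated mode vectors are already available in the model's flat vertex layout [x1, y1, x2, y2, ...]; the mode indices and the function name are hypothetical.

```python
from typing import Iterable, List, Sequence


def animate_model(X_model: Sequence[float],
                  interp_modes: Sequence[Sequence[float]],
                  observed: Sequence[float],
                  enabled: Iterable[int]) -> List[float]:
    """Apply only the selected observed mode parameters to the model vertices."""
    keep = set(enabled)
    X_new = list(X_model)
    for j, (phi, u_j) in enumerate(zip(interp_modes, observed)):
        if j not in keep:
            continue  # parameter held fixed: its observed value is ignored
        for k, d in enumerate(phi):
            X_new[k] += u_j * d
    return X_new


# Mode 0: hypothetical x-translation; mode 1: moves the second vertex in y.
modes = [[1.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
print(animate_model([0.0, 0.0, 1.0, 0.0], modes, [3.0, 0.5], {1}))
# prints [0.0, 0.0, 1.0, 0.5]
```

Here the observed translation of 3.0 is discarded because mode 0 is not enabled, so only the nonrigid mode reaches the model.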

7  Implementation  Details 

Blob construction starts with the determination of a support region for the object of interest. The bounding contour(s) for a support region can be extracted via a standard 4-connected contour following algorithm [20]. Alternatively, the user can define a bounding contour for a region via a sketch interface. In general, the number of contour segments must be reduced. We utilize the tolerance band approach, where the merging stage can be iteratively alternated with recursive subdivision [14]. In practice, a single merging pass is sufficient for a user-sketched boundary.
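The recursive-subdivision half of the tolerance band approach can be illustrated with the classic split step below, a Ramer-Douglas-Peucker style recursion on an open polyline. It is only a sketch: the merging stage and the alternation between the two stages described above are omitted, the tolerance value is arbitrary, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_line_distance(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dy * (p[0] - a[0]) - dx * (p[1] - a[1])) / length


def subdivide(points: Sequence[Point], tol: float) -> List[Point]:
    """Split at the vertex farthest from the chord until every piece fits within tol."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    split, d_max = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_line_distance(points[i], a, b)
        if d > d_max:
            split, d_max = i, d
    if d_max <= tol:
        return [a, b]  # the chord approximates this run of points well enough
    left = subdivide(points[:split + 1], tol)
    right = subdivide(points[split:], tol)
    return left[:-1] + right  # drop the duplicated split vertex


corner = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (2.0, 2.0)]
print(subdivide(corner, 0.1))  # prints [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0)]
```

A closed boundary could be handled by first cutting it into two open pieces at a pair of widely separated contour points.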

The triangles are then generated using an adaptation of Ruppert's Delaunay refinement algorithm [21]. The algorithm accepts two parameters that control angle and triangle size constraints. To satisfy these constraints, additional interior vertices may be added to the original polygon during mesh generation. The source code is available from http://www.netlib.org/voronoi/.

Once a triangle mesh has been generated, an RGB color texture map is extracted from the example image. Each triangle mesh vertex is given an index into the texture map that corresponds to its pixel coordinate in the undeformed example image $I_0$. To improve convergence and noise immunity in tracking, the texture map is blurred using a Gaussian filter. Texture map interpolation and rendering were accomplished using OpenGL.
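The Gaussian pre-blur of the texture map can be sketched as a separable filter on a grayscale image stored as a list of rows, with borders handled by clamping. The paper does not give the filter width, so sigma and the 3-sigma kernel radius are assumptions; each RGB channel would be filtered the same way, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def gaussian_kernel(sigma: float) -> List[float]:
    """Normalized 1D Gaussian taps out to a radius of about 3 sigma."""
    radius = max(1, int(math.ceil(3.0 * sigma)))
    taps = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(taps)
    return [t / total for t in taps]


def blur(image: Sequence[Sequence[float]], sigma: float) -> List[List[float]]:
    """Separable Gaussian blur: filter rows, then columns, clamping at the borders."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    h, w = len(image), len(image[0])

    def clamp(v: int, hi: int) -> int:
        return min(max(v, 0), hi)

    rows = [[sum(k[j + r] * image[y][clamp(x + j, w - 1)] for j in range(-r, r + 1))
             for x in range(w)] for y in range(h)]
    return [[sum(k[j + r] * rows[clamp(y + j, h - 1)][x] for j in range(-r, r + 1))
             for x in range(w)] for y in range(h)]


impulse = [[0.0] * 5 for _ in range(5)]
impulse[2][2] = 1.0
for row in blur(impulse, 1.0):
    print(["%.3f" % v for v in row])
```

Separating the filter into a horizontal and a vertical pass reduces the work per pixel from (2r+1)² multiplications to 2(2r+1).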

Given a triangle mesh, the FE model can be initialized using Gaussian interpolants with finite support. Due to space limitations, readers are directed to [22] for the formulation. The generalized eigenvectors and eigenvalues are computed using code from the EISPACK library: http://www.netlib.org/eispack/.

Finally, difference decomposition basis vectors are precomputed. In practice, four basis vectors per model parameter are sufficient. For each parameter $a_i$, these four basis images correspond with the difference patterns that result by tweaking that parameter by $\pm\delta_i$ and $\pm 2\delta_i$. The factor $2\delta_i$ corresponds to the maximum anticipated change in that parameter per video frame.
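The four perturbations per parameter determine the columns n_k of the matrix N used in the difference decomposition. A minimal sketch, assuming the step sizes δ_i are supplied by the caller (per the text, half of the largest expected per-frame change); each resulting vector would then be passed to the warp to render one basis image b_k. The function name is hypothetical.

```python
from typing import List, Sequence


def perturbation_vectors(deltas: Sequence[float]) -> List[List[float]]:
    """Four displacement vectors per parameter: -2d, -d, +d, +2d along that axis only."""
    cols = []
    for i, d in enumerate(deltas):
        for scale in (-2.0, -1.0, 1.0, 2.0):
            n_k = [0.0] * len(deltas)
            n_k[i] = scale * d
            cols.append(n_k)
    return cols


N_cols = perturbation_vectors([0.1, 0.5])
print(len(N_cols))  # prints 8
print(N_cols[0], N_cols[-1])
```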

8  Experimental  Results 

Our system was implemented on an Indigo2 Impact with a 195 MHz R10K processor, 192 MB RAM, and hardware texture mapping. The code is written in C++ and all timings are reported for unoptimized code. In our interface, both the tracking and the graphics object can be seen at once in two separate 256×256 true color windows.

Fig. 3 shows an example of manipulation of a two-dimensional object. The user grabs a decorated piece of foam rubber that is moved and deformed. The region of the foam that was circled by the user is used to track the foam's movements.

The system observes the user's manipulation of a deformable spongy object via the camera. Representative frames from such a video sequence are shown in Fig. 3(a). The system then tracks the motion of the object by tracking a deformable region as shown in Fig. 3(b).

The recovered deformation parameters are then applied to the computer model for animation as shown in Fig. 3(c). Note that because the voodoo doll device and the graphics model have different object coordinate systems, interpolation and/or extrapolation of the deformations was used (as described in Sec. 6).

It can be seen that the bunny character follows the motion of the foam patch. This example uses 14 deformation modes. The blob used for tracking contains 8503 pixels and consists of 72 points with 112 triangles. The animal object blob contains 10432 pixels and consists of 79 points with 118 triangles. At 256×256 resolution, tracking and deformation ran at an interactive rate of 6.6 frames per second (fps). Thus there was no perceptible delay in the haptic and visual feedback loop for the user controlling the deformation. At 128×128 resolution the same example runs at 10.2 fps with no loss in tracking quality.

In Fig. 4 a 3D object is deformed. In our interface, the object can be rotated so that it and the voodoo doll object being manipulated by the user have the same orientation from the user's point of view. Doing this gives the user the illusion of actually touching the object on the screen. The graphics object moves just like the real object, and in this way, it inherits the real object's physical properties. What was once a rigid cardboard animal cracker box now seems to be made of soft spongy material.

In this example, 20 modes are used to perform the tracking. The tracking blob contains 17278 pixels and consists of 61 points with 87 triangles. The 3D object is produced by extruding the contour of the polygonal mesh of the animal cracker blob. It consists of 116 vertices with 226 triangles. At 256×256 resolution the system runs at 3.3 fps. The larger size of the tracking blob, the 3D rendering, and the extra 6 modes make the system slower than in the first example. However, at 128×128 resolution the same 20-mode example runs at 8.2 fps using the same blob at half size (4335 pixels) for tracking.

Figure 3: Animating a bunny character via direct manipulation of a spongy object. The system observes the user manipulation of a deformable spongy object via a camera; representative frames from such a video sequence are shown in (a). The system then tracks the motion of the object by tracking a deformable region as shown in (b). The recovered deformation parameters are then applied to the graphics model for animation as shown in (c).

9  Conclusion 

The Active Blobs formulation allows a level of control rarely seen in an input device. Due to the increasing popularity of personal computer video cameras and hardware texture mapping, Active Voodoo Dolls could perhaps be used as a game controller for home use in a few years. Games using this system could have characters capable of bending around obstacles, squishing under tables, and slithering around corners, with a level of control unavailable using conventional input devices. The main character in the game could be controlled by moving a real-life stuffed-animal likeness of it in front of the camera.

Another possibility is to use the system as a computer animation tool. An animator can bend a real object the way he wants his synthesized object to bend. Because the physical object is actually being bent, the object provides an innate tactile response. Using the system this way provides a simple and inexpensive form of haptics.

One of the limitations of this system is that the tracking is two-dimensional, and all depth foreshortening effects in tracking must be overcome by the parametric warping functions used. Complex topologies extending along the axis of the camera can cause difficulty in the form of nonhomogeneous shading effects as well as occlusion. However, the graphics object is not limited to being two-dimensional; it can be three-dimensional, with the deformations applied in two dimensions. This gives the user the feeling of manipulating a solid object on the screen.

References 

[1] A. Azarbayejani, T. Starner, B. Horowitz, and A.P. Pentland. Visually controlled graphics. PAMI, 15(6):602-605, June 1993.

[2] M. Black and Y. Yacoob. Tracking and recognizing rigid and non-rigid facial motions using local parametric models of image motion. In Proc. ICCV, 1995.

[3] M.J. Black and A. Rangarajan. On the unification of line processes, outlier rejection, and robust statistics with applications in early vision. IJCV, 19(1):57-91, 1996.

[4] A. Blake and M. Isard. 3D position, attitude and shape input using video tracking of hands and lips. In Proc. SIGGRAPH, 1994.

[5] A. Bobick. Movement, activity, and action: The role of knowledge in the perception of motion. In Proc. Royal Soc.: Special Issue on Knowledge-based Vision in Man and Machine, 1997.




References 

[1]  K.  Bathe.  Finite  Element  Procedures  in  Engineering  Anal¬ 
ysis.  Prentice>Hall,  1982. 

[2]  A.  Baumberg  and  D.  Hogg.  Learning  flexible  models  from 
image  sequences.  In  Proc.  ECCV,  pp.  299-308,  1994. 

[3]  M.  Black  and  Y.  Yacoob.  Tracking  and  recognizing  rigid 
and  non-rigid  facial  motions  using  local  parametric  models 
of  image  motion.  In  Proc,  ICCV,  1995. 

[4]  M.J.  Black  and  A.  Rangarajan.  On  the  unification  of  line 
processes,  outlier  rejection,  and  robust  statistics  with  appli¬ 
cations  in  early  vision.  IJCV^  19(1):57-91,  1996. 

[5]  A.  Blake,  R.  Curwen,  and  A.  Zisserman.  A  framework  for 
spatiotemporal  control  in  the  tracking  of  visual  contours. 
IJCV,  11(2):127-146,  1993. 

[6]  A.  Blake  and  A.  Zisserman.  Visual  Reconstruction.  M.I.T. 
Press,  1987. 

[7] T. Boult, S. Fenster, and T. O’Donnell. Physics in a fantasy
world vs. robust statistical estimation. In Object
Representation in Computer Vision, vol. 994 of Lecture Notes in
Computer Science, pp. 277-296. Springer-Verlag, 1995.

[8]  T.  Cootes.  Combining  point  distribution  models  with  shape 
models  based  on  finite  element  analysis.  In  Proc.  BMVC, 
1994. 

[9] T. Cootes, D. Cooper, C. Taylor, and J. Graham. Trainable
method of parametric shape description. Image and Vision
Comp., 10(5):289-294, 1992.

[10] J. Duncan, R. Owen, L. Staib, and P. Anandan.
Measurement of non-rigid motion using contour shape descriptors.
In Proc. CVPR, pp. 318-324, 1991.

[11] S. Geman and D. Geman. Stochastic relaxation, Gibbs
distribution, and Bayesian restoration of images. PAMI, 6(11),
1984.

[12] M. Gleicher. Projective registration with difference
decomposition. In Proc. CVPR, pp. 331-337, 1997.

[13] G.D. Hager and P.N. Belhumeur. Real time tracking of
image regions with changes in geometry and illumination. In
Proc. CVPR, pp. 403-410, 1996.

[14] F. Hampel, E. Ronchetti, P. Rousseeuw, and W. Stahel.
Robust Statistics: The Approach Based on Influence
Functions. John Wiley, 1986.

[15] M. Irani and S. Peleg. Improving resolution by image
registration. CVGIP: Graphical Models and Image Processing,
53:231-239, 1991.

[16]  A.K.  Jain,  Y.  Zhong,  and  S.  Lakshmanan.  Object  matching 
using  deformable  templates.  PAMI,  18(3):267-278, 1996. 

[17] R. Jain, R. Kasturi, and B. Schunck. Machine Vision.
McGraw-Hill, 1995.

[18]  M.  Kass,  A.  Witkin,  and  D.  Terzopoulos.  Snakes:  Active 
contour  models.  IJCV,  1:321-331,  1987. 


[19] G. Kimeldorf and G. Wahba. A correspondence
between Bayesian estimation on stochastic processes and
smoothing by splines. Ann. of Math. Stat., 41(2):495-502,
1970.

[20]  J.  Martin,  A.  Pentland,  and  R.  Kikinis.  Shape  analysis  of 
brain  structures  using  physical  and  experimental  modes.  In 
Proc.  CVPR,  1994. 

[21] C. Nastar and A. Pentland. Matching and recognition
using deformable intensity surfaces. In Proc. IEEE Sym. on
Comp. Vision, 1995.

[22]  A.  Pentland  and  S.  Sclaroff.  Closed-form  solutions  for 
physically-based  shape  modeling  and  recognition.  PAMI, 
13(7):715-729,  1991. 

[23] W. Press, B. Flannery, S. Teukolsky, and W. Vetterling.
Numerical Recipes in C. Cambridge U. Press, 1988.

[24] A. Rosenfeld and A. Kak. Digital Picture Processing.
Academic Press, 1976.

[25] J. Ruppert. A Delaunay refinement algorithm for quality
2-dimensional mesh generation. J. of Algs., 18(3):548-585,
1995.

[26]  S.  Sclaroff.  Modal  Matching:  A  Method  for  Describing, 
Comparing,  and  Manipulating  Digital  Signals.  PhD  thesis, 
MIT  Media  Lab,  1995. 

[27]  S.  Sclaroff  and  A.  Pentland.  Physically-based  combinations 
of  views:  Representing  rigid  and  nonrigid  motion.  In  Proc. 
IEEE  Workshop  on  Nonrigid  and  Articulate  Motion,  1994. 

[28] S. Sclaroff and A. Pentland. Modal Matching for
Correspondence and Recognition. PAMI, 17(6):545-561, 1995.

[29] J.R. Shewchuk. Triangle: Engineering a 2D quality mesh
generator and Delaunay triangulator. In Proc. ACM
Workshop on App. Comp. Geom., pp. 124-133, 1996.

[30] R. Szeliski. Bayesian Modeling of Uncertainty in Low-Level
Vision. Kluwer, 1989.

[31] R. Szeliski. Video mosaics for virtual environments. IEEE
CG&A, 16(2):22-30, 1996.

[32] R. Szeliski and J. Coughlan. Hierarchical spline-based
image registration. In Proc. CVPR, pp. 194-201, 1994.

[33] D. Terzopoulos. Regularization of inverse visual problems
involving discontinuities. PAMI, 8(4):413-424, 1986.

[34] D. Terzopoulos. On matching deformable models to
images: Direct and iterative solutions. In Topical Meeting on
Machine Vision, vol. 12 of Tech. Digest Series, pp. 160-167,
1987. Optical Soc. of America.

[35] L. Williams. Pyramidal Parametrics. Comp. Graphics,
17(3):1-11, 1983.

[36]  A.  Yuille,  D.  Cohen,  and  P.  Hallinan.  Feature  extraction 
from  faces  using  deformable  templates.  In  Proc.  CVPR, 
pp.  104-109,  1989. 




OMB No. 0704-0188






3.  REPORT  TYPE  AND  DATES  COVERED 



1.  AGENCY  USE  ONLY  (Leave  blank) 

2.  REPORT  DATE 

August  1998 

4. TITLE AND SUBTITLE

Active Voodoo Dolls: A Vision Based Input Device for
Nonrigid Control


5.  FUNDING  NUMBERS 

G  N00014-96-1-0&61 


6.  AUTHOR(S) 

John  Isidoro  and  Stan  Sclaroff 


7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES)

Computer  Science  Department 
Boston  University 
111 Cummington Street
Boston, MA 02215


9.  SPONSORING/MONITORING  AGENCY  NAME(S)  AND  ADDRESS(ES) 

Department  of  the  Navy 
Office of Naval Research
Ballston Centre Tower One
800  North  Quincy  Street 
Arlington,  VA  22 


11.  SUPPLEMENTARY  NOTES 


8.  PERFORMING  ORGANIZATION 
REPORT  NUMBER 

sclaroff-ONR- 

TR98-006 


10. SPONSORING/MONITORING
AGENCY REPORT NUMBER


12a. DISTRIBUTION/AVAILABILITY STATEMENT


12b.  DISTRIBUTION  CODE 


Approved  for  public  release. 


13. ABSTRACT (Maximum 200 words)

A vision based technique for nonrigid control is presented that can be used for
animation and video game applications. The user grasps a soft, squishable
object in front of a camera; the object can be moved and deformed in order to
specify motion. Active Blobs, a nonrigid tracking technique, is used to recover
the position, rotation, and nonrigid deformations of the object. The resulting
transformations can be applied to a texture mapped mesh, thus allowing the user
to control it interactively. Our use of texture mapping hardware in tracking
makes the system responsive enough for interactive animation and video game
character control.


14.  SUBJECT  TERMS 

Nonrigid  shape,  video  database,  motion  description,  and 
recognition 


15. NUMBER OF PAGES

8


16.  PRICE  CODE 


17. SECURITY CLASSIFICATION
OF REPORT

unclassified 


NSN  7540-01-280-5500 


18.  SECURITY  CLASSIFICATION 
OF  THIS  PAGE 

unclassified 


19. SECURITY CLASSIFICATION
OF ABSTRACT


unclassified


20. LIMITATION OF ABSTRACT


Standard  Form  298  (Rev.  2-89) 
Prescribed by ANSI Std. Z39-18


